AMENDMENT TO THE CLAIMS 



1. (Currently amended) A computer readable storage media storing instructions
readable by a computer which, when implemented, cause the computer to resolve an 
overlapping ambiguity string in an input sentence of an unsegmented language by performing 
steps comprising: 



segmenting the sentence into two possible 
segmentations; 

recognizing the overlapping ambiguity string in the input sentence as a function of 
the two segmentations; 

obtaining probability information based on at least one context feature; and



outputting an indication for selecting one of the two segmentations
as a function of the obtained probability information.



2. (Currently amended) The computer readable storage media of claim 1, wherein
obtaining the probability information comprises obtaining probability
information from a language model based on the at least one context feature.



3. (Currently amended) The computer readable storage media of claim 2 wherein
the language model comprises a trigram model.



4. (Currently amended) The computer readable storage media of claim 2 wherein
outputting an indication for selecting one of the two segmentations comprises classifying the
probability information.



5. (Currently amended) The computer readable storage media of claim 4 wherein
classifying comprises classifying using Naive Bayesian Classification.

6. (Currently amended) The computer readable storage media of claim 1 wherein
segmenting the sentence comprises performing a Forward Maximum Matching (FMM) 
segmentation of the input sentence and a Backward Maximum Matching (BMM) 
segmentation of the input sentence. 

7. (Currently amended) The computer readable storage media of claim 6 wherein
recognizing the overlapping ambiguity string comprises recognizing a segmentation O_f of the
overlapping ambiguity string from the FMM segmentation and a segmentation O_b of the
overlapping ambiguity string from the BMM segmentation.

8. (Currently amended) The computer readable storage media of claim 7 wherein
selecting one of the two segmentations is a function of a set of context features associated 
with the overlapping ambiguity string. 

9. (Currently amended) The computer readable storage media of claim 8 wherein
the set of context features comprises words around the overlapping ambiguity string. 

10. (Currently amended) The computer readable storage media of claim 8 wherein
selecting one of the two segmentations comprises classifying the probability information of
the set of context features and O_f.

11. (Currently amended) The computer readable storage media of claim 10 wherein
selecting one of the two segmentations comprises classifying the probability information of
the set of context features and O_b.



12. (Currently amended) The computer readable storage media of claim 8 wherein
selecting comprises determining which of O_f or O_b has a higher probability as a function
of the set of context features.

13. (Currently amended) The computer readable storage media of claim 1 wherein
the unsegmented language is Chinese. 

14. (Currently amended) A method of segmentation of a sentence of an unsegmented language,
the sentence having an overlapping ambiguity string (OAS), comprising:

generating a Forward Maximum Matching (FMM) segmentation of the sentence; 
generating a Backward Maximum Matching (BMM) segmentation of the sentence; 
recognizing an OAS as a function of the FMM and the BMM segmentations; 
obtaining probability information based on at least one context feature and at least
part of the recognized OAS for each of the FMM and BMM; and
outputting an indication for selecting one of the FMM segmentation and the BMM
segmentation as a function of obtained probability information.

15. (Currently amended) The method of claim 14 wherein outputting
includes selecting one of the FMM segmentation
of the overlapping ambiguity string and the BMM segmentation of the overlapping ambiguity
string based on higher probability.

16. (Currently amended) The method of claim 14 wherein obtaining
probability information comprises using an N-gram model.



17. (Currently amended) The method of claim 16 wherein obtaining probability
information comprises obtaining probability information
about a first word of the overlapping ambiguity string.

18. (Currently amended) The method of claim 17 wherein
obtaining probability information comprises using probability information about a
last word of the overlapping ambiguity string.

19. (Currently amended) The method of claim 16 wherein obtaining probability information
using the N-gram model comprises using information about context words around the
overlapping ambiguity string.

20. (Previously presented) The method of claim 16 wherein using the N-gram model comprises 
using information about a string of words comprising a first word of the overlapping ambiguity 
string and two context words to the left of the first word. 

21. (Previously presented) The method of claim 20 wherein using the N-gram model comprises 
using information about a string of words comprising a last word of the overlapping ambiguity 
string and two context words to the right of the last word. 

22. (Currently amended) The method of claim 15 wherein outputting includes
using Naive Bayesian Classifiers.

23. (Original) The method of claim 14 and further comprising receiving information from a 
lexical knowledge base comprising a trigram model. 

24. (Original) The method of claim 23 and further comprising receiving an ensemble of Naive 
Bayesian Classifiers. 



25. (Currently amended) A method of constructing information to resolve overlapping
ambiguity strings in an unsegmented language comprising:

recognizing overlapping ambiguity strings in a training data; 
replacing the overlapping ambiguity strings with tokens; 

generating an N-gram language model comprising information on constituent
words of the overlapping ambiguity strings and context features
surrounding the overlapping ambiguity strings.

26. (Original) The method of claim 25 wherein generating the N-gram language model 
comprises generating a trigram model.

27. (Original) The method of claim 25 and further comprising generating an ensemble of 
classifiers as a function of the N-gram model. 

28. (Original) The method of claim 25 wherein recognizing the overlapping ambiguity strings 
comprises: 

generating a Forward Maximum Matching (FMM) segmentation of each sentence
in the training data;
generating a Backward Maximum Matching (BMM) segmentation of each sentence
in the training data;
recognizing an OAS as a function of the FMM and the BMM segmentations of
each sentence in the training data.

29. (Original) The method of claim 28 and further comprising generating an ensemble of 
classifiers as a function of the N-gram model.



30. (Previously presented) The method of claim 29 wherein generating the ensemble of 
classifiers includes approximating probabilities of the FMM and BMM segmentations of each 
overlapping ambiguity string as being equal to the product of individual unigram probabilities of 
individual words in the FMM and BMM segmentations respectively, of the overlapping 
ambiguity string.

31. (Previously presented) The method of claim 30 wherein generating the ensemble of 
classifiers includes approximating a joint probability of a set of context features conditioned on 
an existence of one of the segmentations of each overlapping ambiguity string as a function of a 
corresponding probability of a leftmost and a rightmost word of the corresponding overlapping 
ambiguity string. 
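
For illustration only, and forming no part of the claims, the FMM segmentation, BMM segmentation and OAS recognition steps recited above can be sketched in Python as follows. The function names, the `max_len` parameter, the toy lexicon and the Latin-letter sentence are all hypothetical. Treating every region where the two segmentations disagree as an OAS candidate is a simplifying assumption of this sketch, not a statement of the claimed method.

```python
def fmm(sentence, lexicon, max_len=4):
    """Forward Maximum Matching: scan left to right, longest lexicon match first."""
    words, i = [], 0
    while i < len(sentence):
        for n in range(min(max_len, len(sentence) - i), 0, -1):
            cand = sentence[i:i + n]
            if n == 1 or cand in lexicon:  # single characters are always accepted
                words.append(cand)
                i += n
                break
    return words


def bmm(sentence, lexicon, max_len=4):
    """Backward Maximum Matching: scan right to left, longest lexicon match first."""
    words, j = [], len(sentence)
    while j > 0:
        for n in range(min(max_len, j), 0, -1):
            cand = sentence[j - n:j]
            if n == 1 or cand in lexicon:
                words.append(cand)
                j -= n
                break
    words.reverse()
    return words


def boundaries(words):
    """Character offsets at which a segmentation places a word boundary."""
    cuts, pos = {0}, 0
    for w in words:
        pos += len(w)
        cuts.add(pos)
    return cuts


def words_between(words, start, end):
    """Words of a segmentation that lie entirely inside [start, end)."""
    out, pos = [], 0
    for w in words:
        if start <= pos and pos + len(w) <= end:
            out.append(w)
        pos += len(w)
    return out


def find_oas(sentence, lexicon, max_len=4):
    """Return (string, O_f, O_b) for each region where FMM and BMM disagree."""
    f = fmm(sentence, lexicon, max_len)
    b = bmm(sentence, lexicon, max_len)
    common = sorted(boundaries(f) & boundaries(b))
    result = []
    for s, e in zip(common, common[1:]):
        o_f, o_b = words_between(f, s, e), words_between(b, s, e)
        if o_f != o_b:  # assumption: any disagreement is an OAS candidate
            result.append((sentence[s:e], o_f, o_b))
    return result


# Toy example: "ab" and "bc" overlap on "b" inside "abc".
print(find_oas("xabcy", {"ab", "bc"}))
```

In this toy input the characters "x" and "y" play the role of context words around the candidate string, and the two returned word lists correspond to O_f and O_b.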



